Speculations: Providing Fault-tolerance and Recoverability in Distributed Environments

Authors

  • Cristian Tapus
  • Jason Hickey
Abstract

Building safe and reliable programs is an important but difficult endeavor. The challenge is even greater in distributed environments, which may involve complex synchronization operations in the presence of process and network failures. Transactions are one of the earliest and simplest abstractions for reliable concurrent programming [2]. They provide fault isolation by guaranteeing the atomicity, consistency, and durability of the actions performed as part of the transaction. Traditional transactions also provide isolation, which prevents the independent actions inside a transaction from being visible to the outside world until the transaction either aborts or commits. In this paper we consider the case where multiple processes may cooperate in a transaction, using message passing for communication. We relax the transactional isolation property to permit inter-process communication while executing inside the transaction. This model can improve performance and provide fault-tolerance for distributed applications. We call these relaxed-isolation transactions speculations, and we introduce them as programming language primitives. Traditional checkpointing and rollback mechanisms used to provide recoverability are similar to our approach, but differ in three ways. First, speculations can provide programs with alternate execution paths upon rollback. Second, speculations are lightweight checkpoints that are stored in memory and can be coupled with a real checkpointing mechanism for increased reliability. Third, speculations are exposed as programming language primitives whose semantics are closer to those of transactions than to those of checkpoints. Our system adapts mechanisms designed for checkpointing/rollback systems [1] to ensure safe recovery lines in case of distributed speculation rollback.
While speculations are similar to the concept of lookahead-rollback introduced by the TimeWarp [3] mechanism, we extend the concept by allowing both explicit and implicit speculations through programming language extensions. The main contributions of this paper include: (1) the introduction of a new programming model based on speculations, (2) the definition of new speculative programming language constructs for distributed applications, and (3) the description of a prototype implementation of speculations in the Linux kernel where speculative operations, including distributed commit and rollback, are transparent.
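The idea of a speculation as an in-memory checkpoint with an alternate execution path on rollback can be illustrated with a minimal single-process sketch. It is not the paper's language primitives or its Linux kernel prototype: the names `speculate` and `SpeculationAborted` and the dictionary-based state are assumptions made purely for illustration, and the sketch does not model message passing or distributed commit and rollback.

```python
import copy


class SpeculationAborted(Exception):
    """Hypothetical signal raised by a speculative body to request rollback."""


def speculate(state, body, alternate):
    """Run body(state) speculatively (illustrative helper, not the paper's API).

    On normal return the changes are kept (commit). If the body raises
    SpeculationAborted, the state is restored from the in-memory checkpoint
    and alternate(state) runs instead.
    """
    checkpoint = copy.deepcopy(state)  # lightweight in-memory checkpoint
    try:
        return body(state)             # commit: changes to state stay visible
    except SpeculationAborted:
        state.clear()                  # rollback: discard speculative changes
        state.update(checkpoint)
        return alternate(state)        # alternate execution path


def optimistic(state):
    # Speculatively apply a withdrawal, then validate the assumption.
    state["balance"] -= 150
    if state["balance"] < 0:
        raise SpeculationAborted()
    return "fast path"


def fallback(state):
    # Runs against the restored state after a rollback.
    state["log"].append("insufficient funds")
    return "alternate path"


account = {"balance": 100, "log": []}
outcome = speculate(account, optimistic, fallback)
```

Here the withdrawal is undone because the assumption fails, so `account` keeps its original balance and only the fallback's log entry survives. A real distributed version would also have to roll back every process that received messages from the aborted speculation, which this sketch leaves out.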


Similar resources

HotDep ’06: Second Workshop on Hot Topics in System Dependability

Summarized by Geoffrey Lefebvre. Making Exception Handling Work. Bruno Cabral and Paulo Marques, University of Coimbra, Portugal. Presented by Bruno Cabral. Exceptions are the standard mechanism for error handling in modern programming languages. Unfortunately, dealing with exceptions is a tedious process. Programmers often avoid the issue by writing empty handlers to save time. Programmers who ...


Distributed Speculations: Providing Fault-tolerance and Improving Performance

This thesis introduces a new programming model based on speculative execution and it examines the use of speculations, a form of distributed transactions, for improving the performance, reliability and fault tolerance of distributed systems. A speculation is defined as a computation that is based on an assumption that is not validated before the computation is started. If the assumption is late...


Improving the palbimm scheduling algorithm for fault tolerance in cloud computing

Cloud computing is the latest technology that involves distributed computation over the Internet. It meets the needs of users through sharing resources and using virtual technology. The workflow user applications refer to a set of tasks to be processed within the cloud environment. Scheduling algorithms have a lot to do with the efficiency of cloud computing environments through selection of su...


Recovery with limited replay: fault-tolerant processes in Linda

Research in the area of fault-tolerant distributed systems has focused to a large extent on data surviving various forms of failure. The replica control algorithms for maintaining mutually consistent replicas abound in number. However, comparatively little work has been devoted to making processes recoverable. In domains other than databases and transaction processing, fault-tolerance generally ...


A Theory of Nested Speculative Execution

Implementing distributed applications is a challenging task. Developers of such systems are confronted with issues like fault-tolerance, efficient synchronization mechanisms, and the correctness of the distributed code. This paper introduces a new programming model based on speculative execution that addresses these issues. Speculations provide distributed atomic rollback and enable optimistic ...



Publication date: 2006